Conversation
… A1+A2 → A-with-CI) Per the math-proofs honest assessment (#1383) outstanding-work matrix, the Lean proof of DBSP chain rule (Prop 3.2 + Dop_LTI_commute) was A-grade today hand-run but missing CI gating. Peer reviewers expect "runs in CI" as the line for an A-grade artifact. This PR ships `.github/workflows/lean-proof.yml`: - Path-filter: only runs on `tools/lean4/**` changes (or workflow itself). Out-of-band from gate.yml because Mathlib's lake cache is multi-GB and Lean toolchain setup is heavier than the existing build-and-test surface. - Caches `tools/lean4/.lake` keyed on lean-toolchain + lakefile.toml + lake-manifest.json (busts cleanly on either change). - Caches `~/.elan` with same key shape as gate.yml's elan cache (shared cache space). - Uses `lake exe cache get` to download Mathlib pre-built oleans from S3 — skips the multi-hour Mathlib build. - Type-checks via `lake env lean Lean4/DbspChainRule.lean`. Verified locally: `lake env lean` returns silent + exit 0 (success) on the current proof. Two corrections to #1383 captured in same PR: 1. **Semgrep is already in CI** — original B4 grading was wrong. Verified by grep: `.github/workflows/gate.yml` line ~422 has `lint (semgrep)` job running `semgrep --config .semgrep.yml --error --metrics=off`. Effective grade upgraded to A; outstanding-work item closed. This illustrates the verify-then-claim discipline. 2. **Outstanding-work matrix updated** with Status column tracking each item: "Done" (Semgrep, A4 registry rows #1393, peer-review email #1387), "In flight" (Lean CI — this PR), "open" (Stryker, B1, B2, C-grade items). Composes with verification-registry.md (rows for both A1+A2 benefit) + the verification-drift-auditor skill (every CI fire is implicit drift-check) + #1383 (the assessment doc this upgrades). Workflow safe-pattern compliant per FACTORY-HYGIENE row #43: SHA-pinned actions, contents:read permissions, concurrency group, pinned runs-on, path-filter narrows trigger.
There was a problem hiding this comment.
Pull request overview
This PR adds a dedicated GitHub Actions workflow to type-check the Lean 4 formalization under tools/lean4, moving the chain-rule proof artifacts from hand-run verification toward the repository’s CI-backed verification posture. It also updates the math-proofs assessment document to reflect the corrected Semgrep status and the current outstanding-work matrix.
Changes:
- Adds
.github/workflows/lean-proof.ymlto install the shared toolchain, warm Lean/Mathlib caches, and runlake env leanforDbspChainRule.lean. - Updates the research assessment to reclassify Semgrep as already CI-gated and to add status values to the outstanding-work matrix.
- Records related status changes for verification follow-up items tied to recent PRs.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
.github/workflows/lean-proof.yml |
New path-filtered workflow for Lean proof type-checking with cache setup and shared installer usage. |
docs/research/2026-05-03-math-proofs-honest-assessment.md |
Updates Semgrep grading and revises the outstanding-work matrix with status annotations. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f5cc5add60
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
…pe outstanding-work row Three review threads on #1394 addressed: 1. + 3. **GITHUB_TOKEN missing**: mise's aqua: backend uses the GitHub API for release-tag lookups during install.sh; without GITHUB_TOKEN it hits the unauthenticated 60/hr rate limit and fails. Added workflow-level env: with secrets.GITHUB_TOKEN — same pattern as gate.yml. Inherits contents:read, no escalation. 2. **Outstanding-work row duplicated**: the Peer-review email draft row was inserted twice (Done + empty status). Removed the empty-status duplicate; single Done row remains.
…6-05-03 EOD progress (#1402) Reflects substantive progress this session across the math-proofs honest assessment matrix. Key state changes: **P0 items — 3 of 3 closed:** - Lean lake-build CI job ✓ (#1394) - A4 registry rows ✓ (#1393) - Peer-review email draft ✓ (#1387) - Stryker B3 → partial (config-fix #1395; CI wire deferred to follow-up substantial-design) **P1 items — significant progress:** - Alloy B2 → A ✓ (#1396 — silent-no-op was the failure mode; spec-path fixed) - Semgrep B4 → A ✓ (verify-then-claim correction; was already in CI) - B1 4 deferred specs → 2 of 4 done: - DbspSpec ✓ #1397 (1M states / 11s) - CircuitRegistration ✓ #1401 (B-0180 closed; 3538 states / <1s) - SpineAsyncProtocol B-0179 still open (counterexample inv.) - SpineMergeInvariants B-0181 still open (counterexample inv.) **Sibling work tracked:** - Phase 0 substrate-discovery PoC ✓ (#1392 — 4.0 MB AOT binary on osx-arm64; cross-platform CI matrix) - 3 broken-spec backlog rows filed (#1398 → B-0179 + B-0180 + B-0181); B-0180 closed (#1401) - `.ts/.sh` parity bug in `tools/backlog/generate-index.ts` closed ✓ (#1400 — both generators byte-identical) This update is bounded substrate work documenting the actual state of the matrix; doesn't add new work, just captures completion. Future matrix re-grades happen as work-items land (per the assessment doc's audit-trail discipline). Composes with #1383 (the original assessment) + every PR referenced above. §33 archive-header lint passes.
…ster + cache-clobber discipline encoded (#1408) Substantial multi-tick session shard. 18 PRs touched (#1383 + #1387 + #1392-#1407 inclusive); 14 merged + 4 in-flight as of shard time. **Math-proofs assessment progress** (#1383 outstanding-work matrix): - A1+A2 → A-with-CI ✓ (#1394 Lean lake-build workflow) - A4 registry rows ✓ (#1393) - B1 → 2 of 4 deferred specs in CI ✓ (#1397 DbspSpec + #1401 CircuitRegistration B-0180 closed) - B2 Alloy → A ✓ (#1396 silent-no-op spec-path fix) - B4 Semgrep → A ✓ (correction) - Peer-review email template ✓ (#1387) - Phase 0 substrate-discovery PoC ✓ (#1392) - Stryker config-fix ✓ (#1395; CI wire deferred) - 3 broken-spec backlog rows filed ✓ (#1398) **Cache-clobber silent-bug class discovered + fully encoded:** B-0180 fix passing locally + failing CI → verify-then-claim identified gate.yml + low-memory.yml caching whole tools/tla and tools/alloy directories. Fix cluster: #1403 (gate.yml) + #1404 (low-memory.yml + audit-ci-cache-paths.ts) + #1406 (CI lint gate) + #1407 (memory file + bug-locus disambiguation per Aaron's 'real github bug?' question — answer: usage-bug, not tool-bug). **Other substrate work:** #1399 BACKLOG.md regen, #1400 .ts/.sh parity bug, #1402 assessment matrix doc update, #1405 B-0182 backlog row (Linux-only formal verification — orthogonal-axes split per Aaron 2026-05-03). **Discipline lessons captured:** chat-is-assertion-channel, substrate-corrections-cluster, search-first-before-architectural- expansion, verify-then-claim CI fidelity, documentation-is- current-state-not-history. Carved sentence: 'When a lucky catch surfaces a class of bug, build the structural fix that eliminates the luck — audit + lint gate + carved-sentence rule + memory file.'
Summary
P0 outstanding item from #1383 math-proofs honest assessment. Adds
.github/workflows/lean-proof.ymlto type-check the Lean 4 / Mathlib formalization (tools/lean4/Lean4/DbspChainRule.lean) on every relevant PR + push to main.What this closes
Peer reviewers expect "runs in CI" as the line for an A-grade artifact. This workflow is the gate.
Workflow design
tools/lean4/**changes (out-of-band from gate.yml because Mathlib cache is multi-GB)tools/lean4/.lakekeyed on lean-toolchain + lakefile.toml + lake-manifest.json~/.elanwith same key shape as gate.yml's elan cachelake exe cache getdownloads Mathlib pre-built oleans from S3 (skips multi-hour build)lake env lean Lean4/DbspChainRule.leantype-checks the proofVerified locally:
lake env leanreturns silent + exit 0 on current proof.Two corrections to #1383 (verify-then-claim sweep)
Semgrep is already in CI — original B4 grading was wrong. Verified
gate.ymlline ~422 haslint (semgrep)running with--error --metrics=off. Effective grade upgraded to A; outstanding-work item closed.Outstanding-work matrix updated with Status column: "Done" (Semgrep + A4 registry rows docs(research): registry rows for the 5 in-CI TLA+ specs (closes A4 external-citation-fidelity gap) #1393 + peer-review email docs(research): peer-review outreach email template + recipient ranking 2026-05-03 #1387), "In flight" (Lean CI — this PR), "open" (Stryker + B1 + B2 + C-grade).
Test plan
lake env leanverified locally (silent + exit 0)🤖 Generated with Claude Code